Optimizing semantic granularity for NLP - report on a lexicographic experiment
Abstract
Experiments with semantic annotation based on Corpus Pattern Analysis (CPA) and the lexical resource PDEV, the Pattern Dictionary of English Verbs (Hanks and Pustejovsky, 2005), revealed the need for an evaluation measure that identifies the optimal relation between two things: the semantic granularity of the categories used to describe a verb, and the reliability of the annotation, expressed as inter-annotator agreement (IAA). We introduce the Reliable Information Gain (RG). RG computes this relation for each tag selected by the annotators and relates it to the entry as a whole. It suggests merging unreliable tags whenever the merge would increase the information gain of the entire tagset (the set of semantic categories in an entry). The merges suggested for our sample of 19 verbs agree with common-sense intuition. One possible application of this measure is quality management of entries in a lexical resource.
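The abstract does not state the RG formula, so the following is only a hypothetical sketch of the idea it describes. It is not the authors' measure. The sketch makes four assumptions:

- Every instance is tagged by exactly two annotators.
- A tag's reliability is its specific (positive) agreement, 2·n_tt / (number of times the tag was chosen).
- The entry-level score is the entropy of the tag distribution, with each tag's term discounted by that reliability.
- Merges are chosen greedily. At each step the tag pair whose merge most raises the score is merged, and the process stops when no merge raises it.

The function names `tag_stats`, `reliable_gain`, `relabel` and `suggest_merges` are illustrative.

```python
from collections import Counter
from itertools import combinations
from math import log2


def tag_stats(pairs: list[tuple[str, str]]) -> dict[str, tuple[float, float]]:
    """Return {tag: (p_t, r_t)} for items tagged by two annotators.

    p_t: share of all assigned labels that are t (assumed frequency weight).
    r_t: specific agreement for t, 2 * n_tt / (times t was chosen) (assumed reliability).
    """
    totals: Counter[str] = Counter()
    agreed: Counter[str] = Counter()
    for a, b in pairs:
        totals[a] += 1
        totals[b] += 1
        if a == b:
            agreed[a] += 1
    n_labels = 2 * len(pairs)
    return {t: (c / n_labels, 2 * agreed[t] / c) for t, c in totals.items()}


def reliable_gain(pairs: list[tuple[str, str]]) -> float:
    """Hypothetical score: tag entropy with each term weighted by the tag's agreement."""
    return sum(r * -p * log2(p) for p, r in tag_stats(pairs).values())


def relabel(pairs: list[tuple[str, str]], old: str, new: str) -> list[tuple[str, str]]:
    """Merge tag `old` into tag `new` in both annotators' labels."""
    return [(new if a == old else a, new if b == old else b) for a, b in pairs]


def suggest_merges(pairs: list[tuple[str, str]]) -> tuple[list[tuple[str, str]], float]:
    """Greedily merge the tag pair that most increases the score; stop when none does."""
    merges: list[tuple[str, str]] = []
    current = reliable_gain(pairs)
    while True:
        tags = sorted({t for pair in pairs for t in pair})
        best = None
        for keep, drop in combinations(tags, 2):
            candidate = relabel(pairs, drop, keep)
            gain = reliable_gain(candidate)
            if gain > current and (best is None or gain > best[0]):
                best = (gain, keep, drop, candidate)
        if best is None:
            return merges, current
        current, keep, drop, pairs = best
        merges.append((keep, drop))
```

Here is a toy example with invented data. Tag "1" is used consistently. Tags "2a" and "2b" are mostly confused with each other.

```python
pairs = [("1", "1")] * 4 + [("2a", "2b")] * 2 + [("2b", "2a")] * 2 + [("2a", "2a"), ("2b", "2b")]
merges, gain = suggest_merges(pairs)
```

In this example the fine-grained tagset scores about 0.876. Merging "2b" into "2a" raises the score to about 0.971, because the loss in entropy is outweighed by the gain in agreement. Merging further would collapse the entry to a single tag with zero entropy, so the procedure stops there.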